<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Preface | Data Science at the Command Line, 2e</title>
<meta name="author" content="Jeroen Janssens">
<meta name="description" content="Data science is an exciting field to work in. It’s also still relatively young. Unfortunately, many people, and many companies as well, believe that you need new technology to tackle the problems...">
<meta name="generator" content="bookdown 0.24 with bs4_book()">
<meta property="og:title" content="Preface | Data Science at the Command Line, 2e">
<meta property="og:type" content="book">
<meta property="og:url" content="https://datascienceatthecommandline.com/preface.html">
<meta property="og:image" content="https://datascienceatthecommandline.com/og.png">
<meta property="og:description" content="Data science is an exciting field to work in. It’s also still relatively young. Unfortunately, many people, and many companies as well, believe that you need new technology to tackle the problems...">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Preface | Data Science at the Command Line, 2e">
<meta name="twitter:description" content="Data science is an exciting field to work in. It’s also still relatively young. Unfortunately, many people, and many companies as well, believe that you need new technology to tackle the problems...">
<meta name="twitter:image" content="https://datascienceatthecommandline.com/twitter.png">
<!-- JS --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.6/clipboard.min.js" integrity="sha256-inc5kl9MA1hkeYUt+EC3BhlIgyp/2jDIyBLS6k3UxPI=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/fuse.js/6.4.6/fuse.js" integrity="sha512-zv6Ywkjyktsohkbp9bb45V6tEMoWhzFzXis+LrMehmJZZSys19Yxf1dopHx7WzIKxr5tK2dVcYmaCk2uqdjF4A==" crossorigin="anonymous"></script><script src="https://kit.fontawesome.com/6ecbd6c532.js" crossorigin="anonymous"></script><script src="libs/header-attrs-2.9/header-attrs.js"></script><script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script><meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link href="libs/bootstrap-4.6.0/bootstrap.min.css" rel="stylesheet">
<script src="libs/bootstrap-4.6.0/bootstrap.bundle.min.js"></script><link href="libs/_Source%20Sans%20Pro-0.4.0/font.css" rel="stylesheet">
<link href="https://fonts.googleapis.com/css2?family=Fira%20Mono:wght@400;600&amp;display=swap" rel="stylesheet">
<script src="libs/bs3compat-0.3.1/transition.js"></script><script src="libs/bs3compat-0.3.1/tabs.js"></script><script src="libs/bs3compat-0.3.1/bs3compat.js"></script><link href="libs/bs4_book-1.0.0/bs4_book.css" rel="stylesheet">
<script src="libs/bs4_book-1.0.0/bs4_book.js"></script><link rel="apple-touch-icon" sizes="180x180" href="/apple-touch-icon.png">
<link rel="icon" type="image/png" sizes="32x32" href="/favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="/favicon-16x16.png">
<link rel="manifest" href="/site.webmanifest">
<link rel="mask-icon" href="/safari-pinned-tab.svg" color="#d42d2d">
<meta name="apple-mobile-web-app-title" content="Data Science at the Command Line">
<meta name="application-name" content="Data Science at the Command Line">
<meta name="msapplication-TileColor" content="#b91d47">
<meta name="theme-color" content="#ffffff">
<script>
      (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
      })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
      ga('create', 'UA-43246574-3', 'auto');
      ga('send', 'pageview');
    </script><script src="https://cdnjs.cloudflare.com/ajax/libs/autocomplete.js/0.38.0/autocomplete.jquery.min.js" integrity="sha512-GU9ayf+66Xx2TmpxqJpliWbT5PiGYxpaG8rfnBEk1LL8l1KGkRShhngwdXK1UgqhAzWpZHSiYPc09/NwDQIGyg==" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mark.js/8.11.1/mark.min.js" integrity="sha512-5CYOlHXGh6QpOFA/TeTylKLWfB3ftPsde7AnmhuitiTX4K5SqCLBeKro6sPS8ilsz1Q4NRx3v8Ko2IBiszzdww==" crossorigin="anonymous"></script><!-- CSS --><link rel="stylesheet" href="dsatcl2e.css">
</head>
<body data-spy="scroll" data-target="#toc">

<div class="container-fluid">
<div class="row">
  <header class="col-sm-12 col-lg-2 sidebar sidebar-book"><a class="sr-only sr-only-focusable" href="#content">Skip to main content</a>

    <div class="d-flex align-items-start justify-content-between">
      <img id="cover" class="d-none d-lg-block" src="images/cover-small.png"><h1 class="d-lg-none">
        <a href="index.html" title="">Data Science at the Command Line, 2e</a>
      </h1>
      <button class="btn btn-outline-primary d-lg-none ml-2 mt-1" type="button" data-toggle="collapse" data-target="#main-nav" aria-expanded="true" aria-controls="main-nav"><i class="fas fa-bars"></i><span class="sr-only">Show table of contents</span></button>
    </div>

    <div id="main-nav" class="collapse-lg">
      <form role="search">
        <input id="search" class="form-control" type="search" placeholder="Search" aria-label="Search">
</form>
      <nav aria-label="Table of contents"><h2>Table of contents</h2>
        <ul class="book-toc list-unstyled">
<li><a class="" href="index.html">Welcome</a></li>
<li><a class="" href="foreword.html">Foreword</a></li>
<li><a class="active" href="preface.html">Preface</a></li>
<li><a class="" href="chapter-1-introduction.html"><span class="header-section-number">1</span> Introduction</a></li>
<li><a class="" href="chapter-2-getting-started.html"><span class="header-section-number">2</span> Getting Started</a></li>
<li><a class="" href="chapter-3-obtaining-data.html"><span class="header-section-number">3</span> Obtaining Data</a></li>
<li><a class="" href="chapter-4-creating-command-line-tools.html"><span class="header-section-number">4</span> Creating Command-line Tools</a></li>
<li><a class="" href="chapter-5-scrubbing-data.html"><span class="header-section-number">5</span> Scrubbing Data</a></li>
<li><a class="" href="chapter-6-project-management-with-make.html"><span class="header-section-number">6</span> Project Management with Make</a></li>
<li><a class="" href="chapter-7-exploring-data.html"><span class="header-section-number">7</span> Exploring Data</a></li>
<li><a class="" href="chapter-8-parallel-pipelines.html"><span class="header-section-number">8</span> Parallel Pipelines</a></li>
<li><a class="" href="chapter-9-modeling-data.html"><span class="header-section-number">9</span> Modeling Data</a></li>
<li><a class="" href="chapter-10-polyglot-data-science.html"><span class="header-section-number">10</span> Polyglot Data Science</a></li>
<li><a class="" href="chapter-11-conclusion.html"><span class="header-section-number">11</span> Conclusion</a></li>
<li><a class="" href="list-of-command-line-tools.html">List of Command-Line Tools</a></li>
</ul>

        <div class="book-extra">
          <p><a id="book-repo" href="https://github.com/jeroenjanssens/data-science-at-the-command-line">View book repository <i class=""></i></a></p>
        </div>

        <div>
          <a id="course-signup" href="/#course">Embrace the Command Line</a>
        </div>
      </nav>
</div>
  </header><main class="col-sm-12 col-md-9 col-lg-7" id="content"><div id="preface" class="section level1 unnumbered">
<h1>Preface<a class="anchor" aria-label="anchor" href="#preface"><i class="fas fa-link"></i></a>
</h1>
<p>Data science is an exciting field to work in.
It’s also still relatively young.
Unfortunately, many people, and many companies as well, believe that you need new technology to tackle the problems posed by data science.
However, as this book demonstrates, many things can be accomplished by using the command line instead, and sometimes in a much more efficient way.</p>
<p>During my PhD program, I gradually switched from using Microsoft Windows to using Linux.
Because this transition was a bit scary at first, I started with having both operating systems installed next to each other (known as a dual-boot).
The urge to switch back and forth between Microsoft Windows and Linux eventually faded, and at some point I was even tinkering around with Arch Linux, which allows you to build up your own custom Linux machine from scratch.
All you’re given is the command line, and it’s up to you what to make of it.
Out of necessity, I quickly became very comfortable using the command line.
Eventually, as spare time got more precious, I settled down with a Linux distribution known as Ubuntu because of its ease of use and large community.
However, the command line is still where I’m spending most of my time.</p>
<p>It actually wasn’t too long ago that I realized that the command line is not just for installing software, configuring systems, and searching files.
I started learning about tools such as <code>cut</code>, <code>sort</code>, and <code>sed</code>.
These are examples of command-line tools that take data as input, do something to it, and print the result.
Ubuntu comes with quite a few of them.
Once I understood the potential of combining these small tools, I was hooked.</p>
<p>After earning my PhD, when I became a data scientist, I wanted to use this approach to do data science as much as possible.
Thanks to a couple of new, open source command-line tools including <code>xml2json</code>, <code>jq</code>, and <code>json2csv</code>, I was even able to use the command line for tasks such as scraping websites and processing lots of JSON data.</p>
<p>In September 2013, I decided to write a blog post titled <a href="http://www.jeroenjanssens.com/2013/09/19/seven-command-line-tools-for-data-science.html">Seven Command-line Tools for Data Science</a>.
To my surprise, the blog post got quite a bit of attention, and I received a lot of suggestions for other command-line tools.
I started wondering whether the blog post could be turned into a book.
I was pleased that, some 10 months later, and with the help of many talented people (see the acknowledgments), the answer was yes.</p>
<p>I am sharing this personal story not so much because I think you should know how this book came about, but because I want you to know that I had to learn about the command line as well.
Because the command line is so different from using a graphical user interface, it can seem scary at first.
But if I could learn it, then you can as well.
No matter what your current operating system is and no matter how you currently work with data, after reading this book you will be able to do data science at the command line.
If you’re already familiar with the command line, or even if you’re already dreaming in shell scripts, chances are that you’ll still discover a few interesting tricks or command-line tools to use for your next data science project.</p>
<div id="what-to-expect-from-this-book" class="section level2 unnumbered">
<h2>What to Expect from This Book<a class="anchor" aria-label="anchor" href="#what-to-expect-from-this-book"><i class="fas fa-link"></i></a>
</h2>
<p>In this book, we’re going to obtain, scrub, explore, and model data—a lot of it.
This book is not so much about how to become <em>better</em> at those data science tasks.
There are already great resources available that discuss, for example, when to apply which statistical test or how data can best be visualized.
Instead, this practical book aims to make you more <em>efficient</em> and <em>productive</em> by teaching you how to perform those data science tasks at the command line.</p>
<p>While this book discusses more than 90 command-line tools, it’s not the tools themselves that matter most.
Some command-line tools have been around for a very long time, while others will be replaced by better ones.
New command-line tools are being created even as you’re reading this.
Over the years, I have discovered many amazing command-line tools.
Unfortunately, some of them were discovered too late to be included in the book.
In short, command-line tools come and go.
But that’s OK.</p>
<p>What matters most is the underlying idea of working with tools, pipes, and data.
Most command-line tools do one thing and do it well.
This is part of the Unix philosophy, which makes several appearances throughout the book.
Once you have become familiar with the command line, know how to combine command-line tools, and can even create new ones, you have developed an invaluable skill.</p>
</div>
<div id="changes-for-the-second-edition" class="section level2 unnumbered">
<h2>Changes for the Second Edition<a class="anchor" aria-label="anchor" href="#changes-for-the-second-edition"><i class="fas fa-link"></i></a>
</h2>
<p>While the command line as a technology and as a way of working is timeless, some of the tools discussed in the first edition have been superseded by newer tools (e.g., <code>csvkit</code> has largely been replaced by <code>xsv</code>), have been abandoned by their developers (e.g., <code>drake</code>), or have turned out to be suboptimal choices (e.g., <code>weka</code>).
I have learned a lot since the first edition was published in October 2014, either through my own experience or as a result of the useful feedback from my readers.
Even though the book is quite niche because it lies at the intersection of two subjects, there remains a steady interest from the data science community, as evidenced by the many positive messages I receive almost every day.
By updating the first edition, I hope to keep the book relevant for at least another five years.
Here’s a nonexhaustive list of changes I have made:</p>
<ul>
<li>I replaced <code>csvkit</code> with <code>xsv</code> as much as possible. <code>xsv</code> is a much faster alternative for working with CSV files.</li>
<li>In Sections 2.2 and 3.2 I replaced the VirtualBox image with a Docker image. Docker is a faster and more lightweight way of running an isolated environment than VirtualBox.</li>
<li>I now use <code>pup</code> instead of <code>scrape</code> to work with HTML. <code>scrape</code> is a Python tool I created myself. <code>pup</code> is much faster, has more features, and is easier to install.</li>
<li>
<a href="chapter-6-project-management-with-make.html#chapter-6-project-management-with-make">Chapter 6</a> has been rewritten from scratch. Instead of <code>drake</code> I now use <code>make</code> to do project management. <code>drake</code> is no longer maintained and <code>make</code> is much more mature and very popular with developers.</li>
<li>I replaced <code>Rio</code> with <code>rush</code>. <code>Rio</code> is a clunky Bash script I created myself. <code>rush</code> is an R package that is a much more stable and flexible way of using R from the command line.</li>
<li>In <a href="chapter-9-modeling-data.html#chapter-9-modeling-data">Chapter 9</a> I replaced Weka and BigML with Vowpal Wabbit (<code>vw</code>). Weka is old and the way it is used from the command line is clunky. BigML is a commercial API on which I no longer want to rely. Vowpal Wabbit is a very mature machine learning tool, developed at Yahoo! and now at Microsoft.</li>
<li>
<a href="chapter-10-polyglot-data-science.html#chapter-10-polyglot-data-science">Chapter 10</a> is an entirely new chapter about integrating the command line into existing workflows, including Python, R, and Apache Spark. In the first edition I mentioned that the command line can easily be integrated with existing workflows, but I never got into that. This chapter fixes that.</li>
</ul>
</div>
<div id="how-to-read-this-book" class="section level2 unnumbered">
<h2>How to Read This Book<a class="anchor" aria-label="anchor" href="#how-to-read-this-book"><i class="fas fa-link"></i></a>
</h2>
<p>In general, I advise you to read this book in a linear fashion.
Once a concept or command-line tool has been introduced, chances are that I employ it in a later chapter.
For example, in <a href="chapter-9-modeling-data.html#chapter-9-modeling-data">Chapter 9</a>, I make heavy use of <code>parallel</code>, which is introduced extensively in <a href="chapter-8-parallel-pipelines.html#chapter-8-parallel-pipelines">Chapter 8</a>.</p>
<p>Data science is a broad field that intersects many other fields such as programming, data visualization, and machine learning.
As a result, this book touches on many interesting topics which unfortunately cannot be discussed at full length.
Throughout the book, at the end of each chapter, there are suggestions for further exploration.
You don’t need to read this material to follow along with the book, but if you’re interested, you’ll know that there’s much more to learn.</p>
</div>
<div id="who-this-book-is-for" class="section level2 unnumbered">
<h2>Who This Book Is For<a class="anchor" aria-label="anchor" href="#who-this-book-is-for"><i class="fas fa-link"></i></a>
</h2>
<p>This book makes just one assumption about you: that you work with data.
It doesn’t matter which programming language or statistical computing environment you’re currently using.
The book explains all the necessary concepts from the beginning.</p>
<p>It also doesn’t matter whether your operating system is Microsoft Windows, macOS, or some flavor of Linux.
The book comes with a Docker image, which is an easy-to-install virtual environment.
It allows you to run the command-line tools and follow along with the code examples in the same environment in which this book was written.
You don’t have to waste time figuring out how to install all the command-line tools and their dependencies.</p>
<p>The book contains some code in Bash, Python, and R, so it’s helpful if you have some programming experience, but it’s by no means required to follow along with the examples.</p>
</div>
<div id="conventions-used-in-this-book" class="section level2 unnumbered">
<h2>Conventions Used in This Book<a class="anchor" aria-label="anchor" href="#conventions-used-in-this-book"><i class="fas fa-link"></i></a>
</h2>
<p>The following typographical conventions are used in this book:</p>
<dl>
<dt><em>Italic</em></dt>
<dd>
<p>Indicates new terms, URLs, directory names, and filenames.</p>
</dd>
<dt><code>Constant width</code></dt>
<dd>
<p>Used for code and commands, as well as within paragraphs to refer to command-line tools and their options.</p>
</dd>
<dt><strong><code>Constant width bold</code></strong></dt>
<dd>
<p>Shows commands or other text that should be typed literally by the user.</p>
</dd>
</dl>
<div class="rmdtip">
This element signifies a tip or suggestion.
</div>

<div class="rmdnote">
This element signifies a general note.
</div>

<div class="rmdcaution">
This element indicates a warning or caution.
</div>
</div>
<div id="acknowledgments" class="section level2 unnumbered">
<h2>Acknowledgments<a class="anchor" aria-label="anchor" href="#acknowledgments"><i class="fas fa-link"></i></a>
</h2>
<div id="acknowledgments-for-the-second-edition-2021" class="section level3 unnumbered">
<h3>Acknowledgments for the Second Edition (2021)<a class="anchor" aria-label="anchor" href="#acknowledgments-for-the-second-edition-2021"><i class="fas fa-link"></i></a>
</h3>
<p>Seven years have passed since the first edition came out.
During this time, and especially during the last 13 months, many people have helped me.
Without them, I would have never been able to write a second edition.</p>
<p>I was once again blessed with three wonderful editors at O’Reilly.
I would like to thank Sarah “Embrace the deadline” Grey, Jess “Pedal to the metal” Haberman, and Kate “Let it go” Galloway. Their middle names say it all. With their incredible help, I was able to embrace the deadlines, put the pedal to the metal when it mattered, and eventually let it go.
I’d also like to thank their colleagues Angela Rufino, Arthur Johnson, Cassandra Furtado, David Futato, Helen Monroe, Karen Montgomery, Kate Dullea, Kristen Brown, Marie Beaugureau, Marsee Henon, Nick Adams, Regina Wilkinson, Shannon Cutt, Shannon Turlington, and Yasmina Greco, for making the collaboration with O’Reilly such a pleasure.</p>
<p>Despite having an automated process to execute the code and paste back the results (thanks to R Markdown and Docker), the number of mistakes I was able to make is impressive.
Thank you Aaditya Maruthi, Brian Eoff, Caitlin Hudon, Julia Silge, Mike Dewar, and Shane Reustle for reducing this number immensely.
Of course, any mistakes left are my responsibility.</p>
<p>Marc Canaleta deserves a special thank you.
In October 2014, shortly after the first edition came out, Marc invited me to give a one-day workshop about <em>Data Science at the Command Line</em> to his team at Social Point in Barcelona.
Little did we both know that many workshops would follow.
It eventually led me to start my own company: Data Science Workshops.
Every time I teach, I learn something new.
They probably don’t know it, but each student has had an impact, in one way or another, on this book.
To them I say: thank you.
I hope I can teach for a very long time.</p>
<p>Captivating conversations, splendid suggestions, and passionate pull requests.
I greatly appreciate each and every contribution by the following generous people:
Adam Johnson,
Andre Manook,
Andrea Borruso,
Andres Lowrie,
Andrew Berisha,
Andrew Gallant,
Andrew Sanchez,
Anicet Ebou,
Anthony Egerton,
Ben Isenhart,
Chris Wiggins,
Chrys Wu,
Dan Nguyen,
Darryl Amatsetam,
Dmitriy Rozhkov,
Doug Needham,
Edgar Manukyan,
Erik Swan,
Felienne Hermans,
George Kampolis,
Giel van Lankveld,
Greg Wilson,
Hay Kranen,
Ioannis Cherouvim,
Jake Hofman,
Jannes Muenchow,
Jared Lander,
Jay Roaf,
Jeffrey Perkel,
Jim Hester,
Joachim Hagege,
Joel Grus,
John Cook,
John Sandall,
Joost Helberg,
Joost van Dijk,
Joyce Robbins,
Julian Hatwell,
Karlo Guidoni,
Karthik Ram,
Lissa Hyacinth,
Longhow Lam,
Lui Pillmann,
Lukas Schmid,
Luke Reding,
Maarten van Gompel,
Martin Braun,
Max Schelker,
Max Shron,
Nathan Furnal,
Noah Chase,
Oscar Chic,
Paige Bailey,
Peter Saalbrink,
Rich Pauloo,
Richard Groot,
Rico Huijbers,
Rob Doherty,
Robbert van Vlijmen,
Russell Scudder,
Sylvain Lapoix,
TJ Lavelle,
Tan Long,
Thomas Stone,
Tim O’Reilly,
Vincent Warmerdam, and
Yihui Xie.</p>
<p>Throughout this book, and especially in the footnotes and appendix, you’ll find hundreds of names.
These names belong to the authors of the many tools, books, and other resources on which this book stands.
I’m incredibly grateful for their hard work, regardless of whether that work was done 50 years or 50 days ago.</p>
<p>Above all, I would like to thank my wife Esther, my daughter Florien, and my son Olivier for reminding me daily what truly matters.
I promise it’ll be a few years before I start writing the third edition.</p>
</div>
<div id="acknowledgments-for-the-first-edition-2014" class="section level3 unnumbered">
<h3>Acknowledgments for the First Edition (2014)<a class="anchor" aria-label="anchor" href="#acknowledgments-for-the-first-edition-2014"><i class="fas fa-link"></i></a>
</h3>
<p>First of all, I’d like to thank Mike Dewar and Mike Loukides for believing that my blog post <a href="http://jeroenjanssens.com/2013/09/19/seven-command-line-tools-for-data-science.html">Seven Command-Line Tools for Data Science</a>, which I wrote in September 2013, could be expanded into a book.</p>
<p>Special thanks to my technical reviewers Mike Dewar, Brian Eoff, and Shane Reustle for reading various drafts, meticulously testing all the commands, and providing invaluable feedback.
Your efforts have improved the book greatly.
The remaining errors are entirely my own responsibility.</p>
<p>I had the privilege of working together with three amazing editors, namely: Ann Spencer, Julie Steele, and Marie Beaugureau.
Thank you for your guidance and for being such great liaisons with the many talented people at O’Reilly.
Those people include: Laura Baldwin, Huguette Barriere, Sophia DeMartini, Yasmina Greco, Rachel James, Ben Lorica, Mike Loukides, and Christopher Pappas.
There are many others whom I haven’t met because they are operating behind the scenes.
Together they ensured that working with O’Reilly has truly been a pleasure.</p>
<p>This book discusses over 80 command-line tools.
Needless to say, without these tools, this book wouldn’t have existed in the first place.
I’m therefore extremely grateful to all the authors who created and contributed to these tools.
The complete list of authors is unfortunately too long to include here; they are mentioned in the Appendix.
Thanks especially to Aaron Crow, Jehiah Czebotar, Christopher Groskopf, Dima Kogan, Sergey Lisitsyn, Francisco J. Martin, and Ole Tange for providing help with their amazing command-line tools.</p>
<p>Eric Postma and Jaap van den Herik, who supervised me during my PhD program, deserve a special thank you.
Over the course of five years they have taught me many lessons.
Although writing a technical book is quite different from writing a PhD thesis, many of those lessons proved to be very helpful in the past nine months as well.</p>
<p>Finally, I’d like to thank my colleagues at YPlan, my friends, my family, and especially my wife Esther for supporting me and for pulling me away from the command line at just the right times.</p>

</div>
</div>
</div>
  <div class="chapter-nav">
<div class="prev"><a href="foreword.html">Foreword</a></div>
<div class="next"><a href="chapter-1-introduction.html"><span class="header-section-number">1</span> Introduction</a></div>
</div></main><div class="col-md-3 col-lg-3 d-none d-md-block sidebar sidebar-chapter">
    <nav id="toc" data-toggle="toc" aria-label="On this page"><h2>On this page</h2>
      <ul class="nav navbar-nav">
<li><a class="nav-link" href="#preface">Preface</a></li>
<li><a class="nav-link" href="#what-to-expect-from-this-book">What to Expect from This Book</a></li>
<li><a class="nav-link" href="#changes-for-the-second-edition">Changes for the Second Edition</a></li>
<li><a class="nav-link" href="#how-to-read-this-book">How to Read This Book</a></li>
<li><a class="nav-link" href="#who-this-book-is-for">Who This Book Is For</a></li>
<li><a class="nav-link" href="#conventions-used-in-this-book">Conventions Used in This Book</a></li>
<li>
<a class="nav-link" href="#acknowledgments">Acknowledgments</a><ul class="nav navbar-nav">
<li><a class="nav-link" href="#acknowledgments-for-the-second-edition-2021">Acknowledgments for the Second Edition (2021)</a></li>
<li><a class="nav-link" href="#acknowledgments-for-the-first-edition-2014">Acknowledgments for the First Edition (2014)</a></li>
</ul>
</li>
</ul>

      <div class="book-extra">
        <ul class="list-unstyled">
<li><a id="book-source" href="https://github.com/jeroenjanssens/data-science-at-the-command-line/blob/master/book/2e/preface.Rmd">View source <i class=""></i></a></li>
        </ul>
</div>
    </nav>
</div>

</div>
</div> <!-- .container -->

<footer class="bg-primary text-light mt-5"><div class="container-fluid">
    <div class="row">
      <div class="d-none d-lg-block col-lg-2 sidebar"></div>
      <div class="col-sm-12 col-md-9 col-lg-7 mt-3" style="max-width: 45rem;">
        <p><strong>Data Science at the Command Line, 2e</strong> by <a href="https://twitter.com/jeroenhjanssens" class="text-light">Jeroen Janssens</a>. Updated on December 14, 2021. This book was built by the <a class="text-light" href="https://bookdown.org">bookdown</a> R package.</p>
      </div>
      <div class="col-md-3 col-lg-3 d-none d-md-block sidebar"></div>
    </div>
  </div>
</footer>
</body>
</html>
